Integrative genetic and single cell RNA sequencing analysis provides new clues to the amyotrophic lateral sclerosis neurodegeneration

Introduction The gradual loss of motor neurons (MNs) in the brain and spinal cord is a hallmark of amyotrophic lateral sclerosis (ALS), but the mechanisms underlying neurodegeneration in ALS are still not fully understood. Methods Based on 75 ALS-pathogenicity/susceptibility genes and large-scale single-cell transcriptomes of human/mouse brain/spinal cord/muscle tissues, we performed an expression enrichment analysis to identify cells involved in ALS pathogenesis. Subsequently, we created a strictness measure to estimate the dosage requirement of ALS-related genes in linked cell types. Results Remarkably, expression enrichment analysis showed that α- and γ-MNs, respectively, are associated with ALS-susceptibility genes and ALS-pathogenicity genes, revealing differences in biological processes between sporadic and familial ALS. In MNs, ALS-susceptibility genes exhibited high strictness, as well as the ALS-pathogenicity genes with known loss of function mechanism, indicating the main characteristic of ALS-susceptibility genes is dosage-sensitive and the loss of function mechanism of these genes may involve in sporadic ALS. In contrast, ALS-pathogenicity genes with gain of function mechanism exhibited low strictness. The significant difference of strictness between loss of function genes and gain of function genes provided a priori understanding for the pathogenesis of novel genes without an animal model. Besides MNs, we observed no statistical evidence for an association between muscle cells and ALS-related genes. This result may provide insight into the etiology that ALS is not within the domain of neuromuscular diseases. Moreover, we showed several cell types linked to other neurological diseases [i.e., spinocerebellar ataxia (SA), hereditary motor neuropathies (HMN)] and neuromuscular diseases [i.e. hereditary spastic paraplegia (SPG), spinal muscular atrophy (SMA)], including an association between Purkinje cells in brain and SA, an association between α-MNs in spinal cord and SA, an association between smooth muscle cells and SA, an association between oligodendrocyte and HMN, a suggestive association between γ-MNs and HMN, a suggestive association between mature skeletal muscle and HMN, an association between oligodendrocyte in brain and SPG, and no statistical evidence for an association between cell type and SMA. Discussion These cellular similarities and differences deepened our understanding of the heterogeneous cellular basis of ALS, SA, HMN, SPG, and SMA.

Although MN functions have been implicated, the mechanisms of neurodegeneration in ALS are still not fully understood (van Es et al., 2017). Moreover, the roles of two distinct subpopulationsα-MNs and γ-MNs-in ALS pathogenesis are not fully elucidated. α-MNs located in the spinal cord innervate extrafusal muscle fibers, which create force to move the skeleton. In contrast, γ-MNs innervate intrafusal fibers, which modulate the sensitivity of muscle spindles to stretch (Burke et al., 1977). A comparison of αand γ-MNs in the spinal cord showed different behavior in ALS (Ragagnin et al., 2019), indicating that these two MNs subpopulations are not affected equally in ALS pathogenesis, raising the question if genetics contributes to the different roles of MNs subpopulations in ALS. Although several genes were found to be associated with familial and sporadic ALS, few studies have addressed possible genetic differences distinguishing familial and sporadic ALS and the cellular basis governing pathogenicity and susceptibility of ALS.
Recent advances of single-cell RNA-sequencing have provided an accurate and comprehensive depiction of cell types and gene expression. On the basis of cell type identification and diseases-related gene expression, previous studies (Skene et al., 2018;Bryois et al., 2020) have revealed various brain cell types involved in neurological disorders, including associations between monoaminergic neurons and neurodegenerative diseases, associations between embryonic GABAergic neurons and neurodevelopmental diseases, and associations between projecting excitatory neurons and psychotic disorders. Our previous study also showed that serotonergic neurons are involved in the etiology and therapy-genetics of anxiety disorders (Liu et al., 2021).
Here, using the same strategy, we applied an empirical method named expression weighted cell type enrichment (Skene and Grant, 2016) (EWCE) to investigate the cellular basis of ALS and four neurological disorders with phenotypic overlap, including hereditary motor neuropathies (HMN), spinocerebellar ataxia (SA), hereditary spastic paraplegia (SPG), spinal muscular atrophy (SMA). Our results suggested that MNs in the spinal cord play a role in ALS, SA, and HMN; additionally, MNs subpopulationsαand γ-MNs-are mainly linked to the susceptibility and pathogenicity of ALS.

Single-cell transcriptome datasets
We employed six large-scale single-cell transcriptome datasets from recent studies (Zeisel et al., 2015;Rosenberg et al., 2018;De Micheli et al., 2020;Supplementary Table 4). The first dataset was generated by Rosenberg et al. (2018). Their study analyzed 156,049 mononuclear transcriptomes in mouse brains and spinal cords and classified these cells into 73 cell types via an unsupervised clustering method. The second dataset was generated by De Micheli et al. (2020). The research team integrated 22,058 single-cell transcriptomes in human muscles (e.g., right serratus, left flexor hallucis longus, orbicularis oris, eye lid, left vastus lateralis, left external oblique, left rectus abdominus, trapezius, right external oblique, right flexor hallucis longus) and resolved 16 distinct populations of muscle-resident cells. The third dataset was generated by Milich et al. (2021). Their study generated 66,178 cells from uninjured and injured (Briefly, mice were anesthetized and received a 65-kilodyne mid-thoracic contusion injury) mouse spinal cord. Cluster analysis of these cells resulted in 15 distinct clusters. The fourth dataset was generated by Sathyamurthy et al. (2018). Their study analyzed 17,354 nuclei from adult mouse lumbar spinal cord and founded seven major clusters. The fifth dataset was generated by Andersen et al. (2021). Their study obtained transcriptomes of 112,554 cells and 34,884 nuclei from four samples of human spinal cord and indicated the cellular landscape of the human spinal cord, including α and γ MNs. The sixth dataset with 9,970 cells was assembled by Skene et al. (2018) from single-cell transcriptome datasets (Dueck et al., 2015;Saraiva et al., 2015;Usoskin et al., 2015;Zeisel et al., 2015). These cells are distributed in various mouse brain regions and were classified into 24 brain cell types. These datasets were released by the authors and accessed from the Gene Expression Omnibus database.
Datasets 1-2 containing single-cell transcriptomes of mouse brain, mouse spinal cord, and human muscle were used for discovery; Datasets 3-4 were used for replicating the associations that were founded in mouse spinal cord. Dataset 5 is used for validating the associations of mouse in human spinal cord. The reason we did not used it for discovery is that the dataset 5 is provided by a preprint study. Dataset 6 was used for validating the associations that were found in mouse brain. Since no cell type annotations of MNs and subpopulations were included in the datasets 3-4, we annotated MNs and subpopulations via the SingleR (Aran et al., 2019) R-package 1 with reference to the spinal cord (Rosenberg et al., 2018) (dataset 1).

Expression weighted cell type enrichment
The EWCE (Skene and Grant, 2016) method has been demonstrated to be reliable for studying the expression specificity across various cell types with single-cell transcriptomes. We employed the EWCE R-package 2 to investigate the cell-type expression specificity of ALS-related genes. Firstly, we used the generate-celltype-data function to calculate the specificity of genes in each cell type. Subsequently, we used the bootstrap-enrichmenttest function to estimate the P-value of specificity of target genes. The bootstrap method randomly samples 10,000 lists of genes with the same number of target genes from all the genes. The specificity of these 10,000 lists of genes was used as background distribution. P-values of specificity of target genes were calculated by the cumulative density function of the specificity distribution and adjusted by the false discovery rate (FDR) method. Statistical significances differed by genes number were estimated by randomly selecting the genes number (5-32) from ALS-pathogenicity genes and exhibited in Supplementary Table 5.

Dosage requirement in related cells
We developed a strictness measure in this study to estimate the dosage requirement for a given gene. Strictness was calculated by the standard deviation of fold change: , in which S refers to strictness, C refers to fold change, i refers to the ith cell, and n refers to the total number of cells. The fold change was calculated by the expression in one cell divided by the mean expression in all cells: C i E i /E, in which E refers to the expression in one cell, E refers to the mean expression, i refers to the ith cell. A high strictness value indicates that a gene is required to have strict expression. A low strictness indicates that the gene expression is tolerant to alterations.
To calculate the significance of strictness for target genes, we developed a method based on the Central Limit Theorem. Central Limit Theorem states that the distribution of the sample means will be approximately normal distribution. Firstly, we have a number (n) of target genes that we want to study, and then we calculate the strictness mean for these n genes (x). Subsequently, we take a sample with n genes from all the genes randomly and repeat the sampling 10, 000 times via a bootstrap method. We calculate the strictness means for each of the 10,000 random samples and then estimate the mean (µ) and the standard deviation (σ) of the distribution via a maximum likelihood estimation. The distribution of sample means should be approximately normal: XN(µ, σ 2 ). Finally, the P-value of the target genes mean is calculated  as: dx. P-values are adjusted by the Bonferroni's method. We employed the three gene sets described below for evaluating the performance of the strictness measure.

Haploinsufficient genes
We accessed 299 known haploinsufficient (HI) genes from Dang et al. (2008). We retained 49 HI genes that are related to neurodegenerative and/or neurodevelopmental diseases (i.e., Alzheimer's disease, Parkinson's disease, Huntington's disease, SA, multiple system atrophy, epilepsy, autism spectrum disorder, and schizophrenia) (Supplementary Table 6). These 49 genes were used as a positive control for evaluating the strictness measure.

Loss-of-function tolerant genes
We accessed 330 putative homozygous loss-of-function tolerant (LoFT) genes from Lek et al. (2016) Table 6). These genes contain at least two different high confidence loss-of-function (LoF) variants that were found in a homozygous state in at least one individual in the ExAC database. These 330 genes were used as a negative control of HI genes.

Non-neurodegeneration-orneurodevelopment diseases-related genes
We accessed 1,189 genes related to non-mental-health diseases from Krishnan et al. (2016). These genes were identified from OMIM and used as negative genes for autism spectrum disorder-related genes. We reviewed these 1,189 genes and then excluded the genes related to neurodegenerative and/or neurodevelopmental diseases, including Alzheimer's disease, Parkinson's disease, Huntington's disease, SA, multiple system atrophy, epilepsy, autism spectrum disorder, and schizophrenia. We retained 1,113 genes related to non-neurodegeneration-orneurodevelopment diseases (NNN) as another negative control of HI genes (Supplementary Table 6).

Data and code availability
Genes related to ALS, HMN, SA, SPG, and SMA are listed in Supplementary Tables 1-3. Single-cell transcriptome datasets are listed in Supplementary Table 4. HI genes, LoFT genes, and NNNrelated genes are listed in Supplementary Table 6.
The code for investigating cell-type specificity and gene expression strictness is written in R-program and is released at GitHub. 3

Ethics approval
This study was reviewed and approved by the Ethics Review Committee at Hebei Medical University and was performed at Hebei Industrial Technology Research Institute of Genomics. No participant or donor was involved in our study.

The involvement of motor neurons in ALS
We collected 32 ALS-pathogenicity genes from the OMIM database and 48 ALS-susceptibility genes from recent large-scale genome-wide association studies. There are five genes (C9orf72, KIF5A, NEK1, SOD1, and TBK1) related to the pathogenicity and susceptibility of ALS ( Figure 1A). To address the cellular basis related to different genetic impacts, we employed the EWCE method developed by Skene and Grant (2016). The EWCE R-package was used to calculate the cell-type specificity of the pathogenicity and susceptibility ALS genes in the singlecell transcriptome data. No neurons in brain were found to be associated with ALS-pathogenicity genes or ALS-susceptibility genes (Figure 2A). Surprisingly, α-MNs and γ-MNs in the spinal cord were associated with ALS-susceptibility genes and ALSpathogenicity genes (Figure 2B), suggesting different involvements of MNs subpopulations in ALS. To confirm our findings, we employed two independent single-cell transcriptome datasets of mouse spinal cord and annotated the cell types using the SingleR package with a reference from Rosenberg et al. (2018). We replicated the specific associations of susceptibility/pathogenicity genes and α-/γ-MNs (Supplementary Table 7). A significant association (P-value = 0.05) between susceptibility genes and α-MNs, as well as a significant association (P-value = 0.03) between pathogenicity genes and γ-MNs, was observed in the replication data of mouse spinal cord data (Milich's dataset). Suggestive associations (P-value of susceptibility genes and α-MNs = 0.08; P-value of pathogenicity genes and γ-MNs = 0.075) were observed in the replication data of mouse spinal cord data (Sathyamurthy's dataset). We also validated these associations in a single-cell transcriptome dataset of human spinal cord and observed a significant association (P-value = 0.039) between susceptibility genes and α-MNs, as well as a significant association (P-value = 0.015) between pathogenicity genes and γ-MNs. The replications in mouse dataset and validation in human dataset demonstrated the robustness of our findings. We combined the statistical summary of three independent single-cell transcriptome datasets of mouse spinal cord and one single-cell transcriptome dataset of human spinal cord via meta-analysis (Willer et al., 2010) and used the total number of cell types (56) Table 7) confirmed γ-MNs associated with ALS-pathogenicity (Pvalue = 6.00 × 10 −7 ) and α-MNs associated with ALS-susceptibility (P-value = 6.86 × 10 −6 ).

Cellular differences of ALS, HMN, SA, SPG, and SMA
Besides ALS, we included four neurological disorders-HMN, SA, SPG, and SMA-with phenotypic overlaps and collected their pathogenicity gene sets. Except for five of the HMNpathogenicity genes shared with SMA, there are few overlaps between each pair among ALS, HMN, SA, SPG, and SMA ( Figure 1B). These results showed potential different contributions of cell types to disease pathophysiology. Analysis of cell typespecific expression of genes related to HMN, SA, SPG, and SMA provided insights into their cellular basis: an association between Purkinje cells in brain and SA, an association between α-MNs in spinal cord and SA, an association between smooth muscle cells and SA, an association between γ-MNs in spinal cord and HMN, an association between oligodendrocyte and HMN, a suggestive association between mature skeletal muscle and HMN (P-value = 0.052), associations between oligodendrocyte subtypes and SPG (Figures 2A-C). The associations of MN/oligodendrocyte in spinal cord/brain were confirmed in replicated/validated datasets (Supplementary Table 8). Similar to ALS, MNs were associated with SA and HMN. To our knowledge, SA and HMN both present ALS-like phenotypes (Anand et al., 2014;Garcia-Santibanez et al., 2018) and MNs degeneration (Ikeda et al., 2012;Beijer and Baets, 2020) is involved in their pathogenesis. Oligodendrocyte is also involved in ALS, a cell type that was shown to induce hyperexcitability and death in mutant SOD1 mouse (Ferraiuolo et al., 2016). Oligodendrocytes in both brain and spinal cord are associated with SPG, consistent with a previous report (Edgar et al., 2004). Oligodendrocytes are associated with HMN. Besides the cellular basis in brain and spinal cord, smooth muscle cells are associated with SA, skeletal muscle cells are associated with HMN. No evidence was observed for an association of SMA, even though SMA and HMN share five pathogenicity genes. To provide a clear image of the connections, we linked the diseases and cell types via a Sankey diagram (Figure 2D). These similarities and differences may provide insights in the etiology of ALS, HMN, SA, SMA, and SPG.

The strict dosage requirement of ALS-susceptibility genes in MNs
Here we have revealed that ALS-related genes are specifically expressed in MNs, however, how these genes affect ALS reminds unclear. To provide insight into the mechanism, we developed a measure called strictness to evaluate the dosage requirement in single-cell populations. We hypothesized that if a gene is dosagesensitive, it will have a narrow range of expression in normal cells and present a higher strictness measure. For example, ANK1 was identified as a dosage-sensitive gene (Dang et al., 2008). The LoF variant in ANK1 causes haploinsufficiency and results in autism spectrum disorder (Yang et al., 2019). The Ank1 strictness measure in mouse brain cells is in the top strictness decile. In contrast, the FAM187B expression is tolerant to be altered because it harbors a common stop-gained variant (MacArthur et al., 2012). There are 43.6% of individuals who have a heterogeneous LoF variant and 34.7% of individuals who have homogeneous LoF variant in the gnomAD (Karczewski et al., 2020) database. The Fam187b strictness measure in mouse brain cells is in the bottom strictness decile (Figure 3A). To evaluate the performance of the strictness measure, we employed three gene sets with known expression patterns in the brain. Dosage-sensitive genes involved in brain functions are expected to have high strictness in brain cells. We collected 49 HI genes related to neurodegenerative diseases and/or neurodevelopmental diseases as a positive control, 300 LoFT genes, and 1,113 NNN disease-related genes as negative controls for the absence of dosage-sensitivity or brain functions, respectively. HI genes exhibited significant high strictness in brain and spinal cord. In contrast, LoFT genes and NNN disease-related genes exhibited low strictness ( Figure 3B). These results demonstrated that strictness is a robust measure of dosage-sensitive genes.
Subsequently, we calculated the strictness of ALS-related genes in spinal cord cell types and showed that the ALSsusceptibility genes present significant high strictness in α-MNs (Figure 3C), indicating that the ALS-susceptibility genes are dosage-sensitive and the LoF of genes may be a mechanism of ALS. The P-value of the ALS-pathogenicity gene strictness in γ-MNs is close to the significant threshold, indicating that there may be different mechanism among these genes. We reviewed the ALS-pathogenicity genes on the basis of animal model and classified these genes into three categories: LoF, gain of function (GoF), and unknown (Supplementary Table 1). We showed that LoF ALS-pathogenicity genes are strictly required in γ-MNs. In contrast, no statistical evidence for supporting the hypothesis that the GoF genes or the genes in unknown category are dosagesensitive ( Figure 3D). These results are consistent with our hypothesis that LoF genes have high strictness and GoF genes have low strictness.
Besides, ALS-susceptibility genes and LoF pathogenicity genes exhibited high strictness in astrocytes and oligodendrocytes, both these two cell types were shown to play important roles in ALS (Ferraiuolo et al., 2016;Stoklund Dittlau et al., 2023). Excitatory, inhibitory, cerebrospinal fluid contacting neurons also required strict expressions of ALS-susceptibility genes. Abnormal cerebrospinal fluid contacting neurons (Ng Kee Kwong et al., 2020), imbalance between excitatory and inhibitory (Foerster et al., 2013;Cavarsan et al., 2022) were reported to contribute to ALS.

Discussion
Our study revealed several cell types associated with ALS and three additional neurological disorders, HMN, SA, and SPG. Most of the related cell types have been demonstrated to be important in the related diseases (Skene et al., 2018;Bryois et al., 2020) [e.g., Purkinje cells (Kasumu and Bezprozvanny, 2012;Xia et al., 2013;Ishida et al., 2016) in SA, oligodendrocyte (Edgar et al., 2004) in SPG]. Notably, MNs are linked to ALS: α-MNs are associated with ALS-susceptibility genes and γ-MNs are associated with ALS-pathogenicity genes. Although the degeneration of MNs is demonstrated to cause ALS and neuromuscular disorders, the pathogenicity of MNs subpopulations is less known. The different roles of MNs subpopulations are consistent with a previous study (Lalancette-Hebert et al., 2016) that showed different vulnerabilities of MNs subpopulations: α-MNs are selectively degenerated and γ-MNs are completely spared in an ALS mutant mouse model. These results suggested that αand γ-MNs do not play equal roles in ALS. Besides MNs, we found no statistical evidence for an association between muscle cells and ALS-related genes ( Figure 2C). To our knowledge, MN death is the core event of ALS pathology (Anakor et al., 2022), however, the disruption of the neuromuscular junction is an early event in ALS pathology (Cappello and Francolini, 2017). Skeletal muscle metabolic dysregulation and atrophy in SOD1 mutation transgenic mice (Brooks et al., 2004;Marcuzzo et al., 2011) andiPSCs (Badu-Mensah et al., 2020) derived from ALS patients harboring SOD1 mutation were suggested to play a role in ALS. The muscle atrophy in SOD1 model doesn't conflict with our result. Our study investigated the most specific cell type expressed ALS-related genes. We cannot exclude the potential connection between specific gene and other cell types. Moreover, we observed SOD1 is specifically expressed in human muscle (Supplementary Table 9). Limited to the power for detecting minor characteristics of a few genes, the P-value of gene expression specificity in muscle did not access the significant threshold after FDR. Taken together, these results deepened our understanding of ALS pathogenesis.
We showed that the ALS-susceptibility genes are dosagesensitive in MNs, as well as the ALS-pathogenicity genes with known LoF mechanism in γ-MNs. In contrast, the GoF ALSpathogenicity genes or the genes with unknown mechanism exhibited low strictness (Figures 3D, E). SOD1 is one of the GoF pathogenicity genes. Transgenic mutant SOD1 mice and rats develop characteristics that are similar to human ALS. A previous study showed that the complete absence of SOD1 in mice did not precipitate ALS-related phenotypes (Reaume et al., 1996). A low strictness value of Sod1 is consistent with the GoF mechanism ( Figure 3E). Indeed, which mechanism causes the ALS-gain-or LoF-is still not clear (Kabashi et al., 2010;Saccon et al., 2013;Mizielinska and Isaacs, 2014;Scekic-Zahirovic et al., 2016). Our study suggested the main characteristics of ALS-susceptibility genes is dosage-sensitive, highlighting the need to carefully consider the LoF mechanism in sporadic ALS. For LoF ALS-pathogenicity genes, they were shown to strictly express in γ-MNs but not α-MNs. This result may suggest γ-MNs are more vulnerable to dosage alteration of the LoF pathogenicity genes. We noticed that there is no significant difference in the vulnerability of αand γ-MNs to ALS-susceptibility genes. This result doesn't conflict with the association between ALSsusceptibility genes and α-MNs. The ALS-susceptibility genes were shown to highly express in α-MNs. Compared with other cells with lower expression, the α-MNs were largely affected by the alteration of ALS-susceptibility genes. In contrast, the strictness of SPGrelated genes was shown differences between oligodendrocytes and other un-associated cells, as well as that of ALS-pathogenicity genes and γ-MNs. Indeed, the strictness measure was designed to compare the different expression spectrums in specific cell type between LoF genes and GoF genes as a fellow independent study of cell type expression specificity analysis. This measure was demonstrated by the LoF ALS-pathogenicity genes strictly expressed in related γ-MNs and exhibited the strict expression of SPG-related genes in oligodendrocytes, SA-related genes in α-MNs, ALS-related genes in oligodendrocytes. HMN-related genes were shown no LoF characteristic in related MNs. Based on these results, strictness analysis may provide a priori understanding of the mechanisms of disease-related genes without an animal model.
Similar to ALS, MNs were associated with SA and HMN. To our knowledge, SA and HMN both present ALS-like phenotypes (Anand et al., 2014;Garcia-Santibanez et al., 2018), and MNs degeneration (Ikeda et al., 2012;Beijer and Baets, 2020) are involved in their pathogenesis. Oligodendrocytes are associated with ALS, SPG, and HMN. The links to SPG and ALS are consistent with previous reports (Edgar et al., 2004;Ferraiuolo et al., 2016). Besides, smooth muscle cells are associated with SA, and skeletal muscle cells are associated with HMN. No evidence was observed for an association of SMA, even though SMA and HMN share five pathogenicity genes. These similarities and differences may provide insights into the etiology of ALS, HMN, SA, SMA, and SPG.
Upper MNs should be considered in our further investigation. ALS patients showed loss of pyramidal tract upper MNs, specifically Betz cells (Hammer et al., 1979). The cortical connections of Betz cells are impaired prior to ALS onset (Genç et al., 2017). Betz cells were found below the surface of the cerebral cortex within layer V of the primary motor cortex and make direct connections to spinal MNs (Lemon, 2008). However, the mouse brain single-cell transcriptome datasets employed in our study did not annotate Betz cells. Betz cell specifically expressed POU3F1 gene (Machado et al., 2014) which can be used as a marker for cell identification in mouse and human. A recent study with more than 450,000 transcriptomes and epigenomes in humans, marmoset monkeys and mice showed a broadly conserved cellular makeup of primary motor cortex, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. Our discoveries in mouse spinal cord were validated in human spinal cord. These consistent results indicated data from mouse can be used for human disease investigation, providing a reliable pathway for further investigation in Betz cells and other novel cell types.
In summary, our study revealed the cellular basis of ALS, HMN, SA, and SPG and the dosage characteristic of the diseaserelated genes in specific cell types. Moreover, the methods-cell type enrichment and gene expression variability-are useful for investigating the role of genes in various cell types, which may open the door for valuable mining of public single-cell data and genetic knowledge of human diseases.

Data availability statement
The original contributions presented in this study are included in the article/Supplementary material, further inquiries can be directed to the corresponding authors.